Bootstrapping pronunciation dictionaries: practical issues

Authors

  • Marelie H. Davel
  • Etienne Barnard
Abstract

Bootstrapping techniques are an efficient way to develop electronic pronunciation dictionaries [1, 2], but require fast system response to be practical for medium-to-large lexicons. In addition, user errors are inevitable during this process, and it is useful if automatic means can be used to assist in the search for such errors. We describe how the Default&Refine grapheme-to-phoneme rule extraction algorithm [3] can be adapted to meet both of these goals. Experimental results demonstrate the utility of these methods.

Similar resources

The efficient generation of pronunciation dictionaries: human factors during bootstrapping

Bootstrapping techniques have significant potential for the efficient generation of linguistic resources such as electronic pronunciation dictionaries. We describe a system and an approach to bootstrapping for the development of such dictionaries, and report on experiments conducted to investigate the efficiency and effectiveness of the system, focusing on the human factors that influence the p...

The efficient generation of pronunciation dictionaries: machine learning factors during bootstrapping

Several factors affect the efficiency of bootstrapping approaches to the generation of pronunciation dictionaries. We focus on factors related to the underlying rule-extraction algorithms, and demonstrate variants of the Dynamically Expanding Context algorithm, which are beneficial for this application. In particular, we show that continuous updating of the learned rules, coupled with a new app...

Semi-supervised G2P bootstrapping and its application to ASR for a very under-resourced language: Iban

This paper describes our experiments and results on using a locally dominant language in Malaysia (Malay) to bootstrap automatic speech recognition (ASR) for a very under-resourced language: Iban (also spoken in Malaysia, on the island of Borneo). Resources in Iban for building a speech recognition system were nonexistent. For this, we tried to take advantage of a language from the same family with se...

Efficient compression method for pronunciation dictionaries

Pronunciation dictionaries are often used with other data-driven methods to model the pronunciations in phoneme-based automatic speech recognition (ASR) and text-to-speech (TTS) systems. The dictionaries usually take a great amount of memory, which is a limiting factor in portable handheld devices. Compressing the pronunciation dictionaries results in minimal transmission bandwidth and less stora...

Automatic Learning and Optimization of Pronunciation Dictionaries

Pronunciation dictionaries are the interface between orthographic and phonetic representation of the speech signal and are thereby a substantial component of speech recognition systems. In many systems simple canonical pronunciation forms are used within the dictionary. They represent the “correct” pronunciation as they are found in lexicons and neither contain the most frequent pronunciation n...


Publication date: 2005